Create sarif-splitter plugin to split SARIF files by categories#20
Copilot wants to merge 5 commits into
Co-authored-by: felickz <1760475+felickz@users.noreply.github.com>
Do not allow multiple different splitting methods in one pass; it's one technique at a time. When you do your testing, let's focus on the severity test with https://github.com/advanced-security/sarif-toolkit/blob/main/examples/input-example.sarif. This file has criticals, highs, and mediums, and a SARIF file should be generated for each. Further, inspect those files to ensure the expected number of alerts is present in each. Why not create some unit tests for this? Since it will be writing files, ensure each test cleans up after itself and does not run in parallel. Ensure the category is as expected, as this is critical in each file. Add a summary output to the tool that shows a before/after view: a small table with the SARIF file name, number of alerts, and category name for each.
…d unit tests, and summary output Co-authored-by: felickz <1760475+felickz@users.noreply.github.com>
Implemented all requested feedback in commit 128847b:
✅ Single splitting method restriction: the plugin now rejects multiple methods with a clear error message.
Testing with input-example.sarif and a comprehensive test:
Category validation: all files have the correct category.
Perfect alert preservation: total input alerts = total output alerts (no loss).
This PR implements a new sarif-splitter plugin that addresses the need to split large SARIF files into smaller, categorized files for better organization and to overcome GitHub Advanced Security upload restrictions.
Problem Solved
Large SARIF files can exceed GitHub's upload size limits and make it difficult to organize security alerts effectively. The new splitter plugin enables teams to:
Key Features
Path-Based Splitting
Split alerts based on file path patterns using glob matching:
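A minimal sketch of how glob-based path categorization might work, using the default patterns listed in this description. The helper name `categorize_path`, the first-match-wins ordering, and the use of `fnmatch` are assumptions for illustration, not the plugin's confirmed implementation.

```python
from fnmatch import fnmatchcase

# Default categories quoted from the PR description.
# Assumption: rules are evaluated in order and the first match wins.
PATH_RULES = [
    ("Tests", ["**/test/**", "**/tests/**", "**/*test*"]),
    ("App", ["**/web/**", "**/api/**", "**/src/**", "**/app/**"]),
]


def categorize_path(uri, language, rules=PATH_RULES):
    """Return a category ID for a result's file URI (hypothetical helper)."""
    # Prefix "/" so patterns starting with "**/" also match top-level directories.
    candidate = "/" + uri.lstrip("/")
    for name, patterns in rules:
        # fnmatch's "*" also matches "/", so "**" behaves like a recursive wildcard here.
        if any(fnmatchcase(candidate, pattern) for pattern in patterns):
            return f"/language:{language}/category:{name}"
    # Fallback bucket so that no alert is dropped.
    return f"/language:{language}/filter:none"
```

Under these assumptions, `categorize_path("src/tests/test_api.py", "python")` lands in the Tests category, while a path that matches no rule falls through to `filter:none`.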
Default path categories:
Tests: **/test/**, **/tests/**, **/*test*
App: **/web/**, **/api/**, **/src/**, **/app/**
Severity-Based Splitting
Split alerts by security severity levels automatically extracted from SARIF rule properties:
Severity mapping:
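As a rough illustration only, the sketch below maps a numeric security-severity score to the bucket names used in this PR (Critical, High, Medium, Low, Others). The thresholds are an assumption borrowed from GitHub's documented security-severity ranges and are not confirmed by this PR; `severity_bucket` and `severity_category` are hypothetical names.

```python
def severity_bucket(score):
    """Map a security-severity score to a bucket name (thresholds are assumed)."""
    try:
        value = float(score)
    except (TypeError, ValueError):
        return "Others"  # missing or unparsable score goes to the fallback bucket
    if value >= 9.0:
        return "Critical"
    if value >= 7.0:
        return "High"
    if value >= 4.0:
        return "Medium"
    if value > 0.0:
        return "Low"
    return "Others"


def severity_category(language, score):
    """Build the category ID for a score, e.g. /language:python/severity:High."""
    return f"/language:{language}/severity:{severity_bucket(score)}"
```

Scores are accepted as strings or numbers because SARIF producers commonly store `security-severity` as a string.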
Single Splitting Method Restriction
The plugin enforces that only one splitting method can be used at a time. Users must choose either --split-by-path OR --split-by-severity, not both, to ensure focused and predictable splitting behavior.
GitHub Advanced Security Integration
Each split SARIF file includes proper runAutomationDetails.id categories following GitHub's conventions:
/language:python/category:Tests, /language:python/filter:none
/language:python/severity:Critical, /language:python/severity:High, /language:python/severity:Medium, /language:python/severity:Low
Summary Output Table
The plugin provides a comprehensive summary table showing before/after views:
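A small sketch of how such a table could be rendered, with the three columns requested in the review (SARIF file name, number of alerts, category). The function name `format_summary` and the layout are assumptions, and any file names or counts passed in are the caller's own data, not output from the plugin.

```python
def format_summary(rows):
    """Render (file name, alert count, category) rows as a plain-text table."""
    headers = ("SARIF file", "Alerts", "Category")
    table = [headers] + [(name, str(count), category) for name, count, category in rows]
    # Column width is the longest cell in each column.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for index, row in enumerate(table):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(" | ".join(cells).rstrip())
        if index == 0:
            # Separator line under the header row.
            lines.append("-+-".join("-" * width for width in widths))
    return "\n".join(lines)
```

Calling it once for the input file and once for the generated files would give the before/after view in two blocks.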
Configurable Rules
Custom splitting rules via JSON configuration files:
{
  "path_rules": [
    {
      "name": "Frontend",
      "patterns": ["**/web/**", "**/*.js", "**/*.jsx"]
    },
    {
      "name": "Backend",
      "patterns": ["**/api/**", "**/*.py", "**/*.java"]
    }
  ]
}
Technical Implementation
SARIF Model Enhancement
AutomationDetailsModel to support GitHub Advanced Security categories
RunsModel to include automationDetails field
Robust Property Access
The plugin handles various SARIF property formats for security-severity extraction:
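One way tolerant extraction could look is sketched below. It assumes a rule may arrive either as a plain dict or as a model object, and that the value may be a string or a number. The function name `get_security_severity` and the `security_severity` attribute and key spellings are assumptions for illustration; only the `security-severity` property name is taken from this description.

```python
def get_security_severity(rule):
    """Return the rule's security-severity as a float, or None if unavailable."""
    # Rules may be plain dicts (raw JSON) or model objects with attributes (assumed).
    if isinstance(rule, dict):
        props = rule.get("properties")
    else:
        props = getattr(rule, "properties", None)
    if props is None:
        return None
    if isinstance(props, dict):
        # Hyphenated key first; the underscore variant is a hypothetical fallback.
        raw = props.get("security-severity", props.get("security_severity"))
    else:
        raw = getattr(props, "security_severity", None)
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None  # missing or non-numeric value
```

Returning `None` rather than raising lets the caller route the alert to a fallback category instead of dropping it.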
No Alert Loss Guarantee
All alerts are preserved through fallback categories:
/language:<lang>/filter:none
/language:<lang>/severity:Others
Usage Examples
Basic splitting (single method only):
# Split by severity levels only
python -m sariftoolkit --enable-splitter \
  --split-by-severity \
  --language javascript --sarif scan-results.sarif \
  --output ./categorized-results
Custom configuration:
Bug Fixes
This PR also fixes an existing dataclass configuration bug that was preventing the toolkit from running:
Testing
The implementation has been thoroughly tested with:
runAutomationDetails.id formatting for GitHub Advanced Security
All generated SARIF files maintain complete metadata while properly categorizing alerts for improved dashboard organization with zero alert loss.
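The two properties claimed here, matching categories and zero alert loss, can be checked with a few lines. This is a sketch under assumptions, not the PR's test suite: `count_alerts` and `verify_split` are hypothetical helpers, and they operate on SARIF documents already loaded as dicts, reading the category from each run's `automationDetails.id`.

```python
def count_alerts(sarif):
    """Count results across all runs of a SARIF document loaded as a dict."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


def verify_split(original, parts):
    """Check split outputs; `parts` maps expected category ID -> SARIF dict.

    Returns a list of problem descriptions; an empty list means the split
    preserved every alert and every run carries the expected category.
    """
    problems = []
    for expected_id, doc in parts.items():
        for run in doc.get("runs", []):
            actual = (run.get("automationDetails") or {}).get("id")
            if actual != expected_id:
                problems.append(f"category mismatch: expected {expected_id}, got {actual}")
    total_out = sum(count_alerts(doc) for doc in parts.values())
    total_in = count_alerts(original)
    if total_out != total_in:
        problems.append(f"alert count mismatch: input {total_in}, output {total_out}")
    return problems
```

In a unit test that writes files, the same check could run against documents read back from a temporary directory that the test removes afterwards.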